Decision Trees: More Theoretical Justification

Authors

  • Amos Fiat
  • Dmitry Pechyony
Abstract

We study impurity-based decision tree algorithms such as CART, C4.5, etc., to better understand their theoretical underpinnings. We consider such algorithms on special forms of functions and distributions: the uniform distribution, and functions that can be described as unate functions, linear threshold functions, and read-once DNF. For unate functions we show that maximal purity gain and maximal influence are logically equivalent. This leads to the exact identification of unate functions by impurity-based algorithms given sufficiently many noise-free examples. We show that for this class of functions these algorithms build minimal-height decision trees. We then show that if the unate function is a read-once DNF or a linear threshold function, the decision tree resulting from these algorithms has the minimal number of nodes among all decision trees representing the function. Based on the statistical query learning model, we introduce a noise-tolerant version of practical decision tree algorithms. We show that when the input examples have small classification noise and are uniformly distributed, all our results for practical noise-free impurity-based algorithms also hold for their noise-tolerant version.
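The equivalence between maximal purity gain and maximal influence can be checked by hand on a small case. The sketch below is only an illustration, not the paper's proof or algorithm: the function x0 OR (x1 AND x2), the use of the Gini index as the impurity measure, and all names in the code are assumptions chosen for the example. It enumerates the uniform distribution over three bits and compares the variable with the largest influence to the variable with the largest purity gain.

```python
from itertools import product

N = 3  # number of Boolean variables in this toy example
INPUTS = list(product((0, 1), repeat=N))  # uniform distribution: every input equally likely


def f(x):
    # Hypothetical target: the monotone (hence unate) read-once DNF x0 OR (x1 AND x2).
    return int(x[0] or (x[1] and x[2]))


def gini(p):
    # Gini impurity of a node in which a fraction p of the examples is labeled 1.
    return 2.0 * p * (1.0 - p)


def prob_one(points):
    # Fraction of the given inputs on which f evaluates to 1.
    return sum(f(x) for x in points) / len(points)


def influence(i):
    # Probability, over a uniform input, that flipping bit i changes f.
    changed = 0
    for x in INPUTS:
        y = list(x)
        y[i] ^= 1
        changed += f(x) != f(y)
    return changed / len(INPUTS)


def purity_gain(i):
    # Impurity of the root minus the weighted impurity of the two children
    # obtained by splitting on x_i (each child has weight 1/2 under uniform inputs).
    zeros = [x for x in INPUTS if x[i] == 0]
    ones = [x for x in INPUTS if x[i] == 1]
    return gini(prob_one(INPUTS)) - 0.5 * gini(prob_one(zeros)) - 0.5 * gini(prob_one(ones))


influences = [influence(i) for i in range(N)]
gains = [purity_gain(i) for i in range(N)]
best_by_influence = max(range(N), key=influences.__getitem__)
best_by_gain = max(range(N), key=gains.__getitem__)

print("influences:", influences)
print("purity gains:", gains)
print("split chosen by influence:", best_by_influence, "by purity gain:", best_by_gain)
```

In this example both criteria select x0, which is also the root of a minimal decision tree for the function. Swapping `gini` for another impurity function, such as the binary entropy, is a one-line change.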

Similar resources

TEL-AVIV UNIVERSITY RAYMOND AND BEVERLY SACKLER FACULTY OF EXACT SCIENCES SCHOOL OF COMPUTER SCIENCE Decision Trees: More Theoretical Justification for Practical Algorithms

We study impurity-based decision tree algorithms such as CART, C4.5, etc., so as to better understand their theoretical underpinnings. We consider such algorithms on special forms of functions and distributions. We deal with the uniform distribution and functions that can be described as Boolean linear threshold functions and read-once DNF. We show that for boolean linear threshold function...

Intuition and the junctures of judgment in decision procedures for clinical ethics.

Moral decision procedures such as principlism or casuistry require intuition at certain junctures, as when a principle seems indeterminate, or principles conflict, or we wonder which paradigm case is most relevantly similar to the instant case. However, intuitions are widely thought to lack epistemic justification, and many ethicists urge that such decision procedures dispense with intuition in...

Decision Trees: More Theoretical Justification for Practical Algorithms

We study impurity-based decision tree algorithms such as CART, C4.5, etc., so as to better understand their theoretical underpinnings. We consider such algorithms on special forms of functions and distributions. We deal with the uniform distribution and functions that can be described as Boolean linear threshold functions or read-once DNF. We show that for boolean linear threshold functions...

Rule Extraction from Ensemble Methods Using Aggregated Decision Trees

Ensemble methods have become very well known for being powerful pattern recognition algorithms capable of achieving high accuracy. However, ensemble methods produce learners that are not comprehensible or transferable, thus making them unsuitable for tasks that require a rational justification for making a decision. Rule Extraction methods can resolve this limitation by extracting comprehensibl...

Boosting with Multi-Way Branching in Decision Trees

It is known that decision tree learning can be viewed as a form of boosting. However, existing boosting theorems for decision tree learning allow only binary-branching trees and the generalization to multi-branching trees is not immediate. Practical decision tree algorithms, such as CART and C4.5, implement a trade-off between the number of branches and the improvement in tree quality as measur...

Publication date: 2007